我们提出了一种新的“泊松流”生成模型(PFGM),该模型将高维半球上的均匀分布映射到任何数据分布中。我们将数据点解释为$ z = 0 $超平面上的电荷,在增加额外尺寸$ z $的空间中,产生了高维电场(泊松方程解决方案的梯度)。我们证明,如果这些电荷沿电场线向上流动,则它们在$ z = 0 $平面中的初始分布将变成半径$ r $半球的分布,该分布在$ r \ to \ infty $限制中变成均匀。为了学习徒的转化,我们估计了增强空间中的归一化场。对于采样,我们设计了一种由物理上有意义的附加尺寸锚定的向后ode:当$ z $达到零时,样本击中了未加重的数据歧管。在实验上,PFGM在CIFAR-10上的正常流量模型中实现了当前的最新性能,其成立分数为9.68美元,而FID得分为2.48美元。它还可以与最先进的SDE方法相同,同时提供$ 10 \ times $至$ 20 \ $ 20 \ times $ $加速图像生成任务。此外,PFGM在较弱的网络体系结构上似乎更宽容估计误差,并且对Euler方法中的步骤大小稳健。该代码可在https://github.com/newbeeer/poisson_flow上找到。
translated by 谷歌翻译
有损压缩和聚类的核心是学习代表的忠诚度和规模之间的权衡。我们的目标是绘制并研究量化此权衡的帕累托前沿。我们关注确定性信息瓶颈(DIB)目标在硬聚类空间上的优化。为此,我们介绍了原始的DIB问题,当优化在离散搜索空间上时,我们显示出比以前研究的拉格朗日放松的最丰富的领域。我们提出了一种算法,用于绘制原始DIB权衡的Pareto前沿,该算法也适用于其他两种目标聚类问题。我们研究了帕累托边境的一般特性,并提供了总体上对数稀疏性的分析和数值证据。我们提供的证据表明,尽管有超过指数的搜索空间,但我们的算法具有多项式缩放,此外,我们提出了对算法的修改,该算法可以在预期采样噪声显着的情况下使用。最后,我们使用算法来绘制三个不同任务的DIB前沿:压缩英语字母,从自然图像中提取信息性的颜色类别,并压缩了一个以群体理论为灵感的数据集,揭示了Frontier的有趣特征,并演示了如何展示的结构。边界可用于模型选择,重点是先前由凸船斗篷隐藏的点。
translated by 谷歌翻译
With the advent of Neural Style Transfer (NST), stylizing an image has become quite popular. A convenient way for extending stylization techniques to videos is by applying them on a per-frame basis. However, such per-frame application usually lacks temporal-consistency expressed by undesirable flickering artifacts. Most of the existing approaches for enforcing temporal-consistency suffers from one or more of the following drawbacks. They (1) are only suitable for a limited range of stylization techniques, (2) can only be applied in an offline fashion requiring the complete video as input, (3) cannot provide consistency for the task of stylization, or (4) do not provide interactive consistency-control. Note that existing consistent video-filtering approaches aim to completely remove flickering artifacts and thus do not respect any specific consistency-control aspect. For stylization tasks, however, consistency-control is an essential requirement where a certain amount of flickering can add to the artistic look and feel. Moreover, making this control interactive is paramount from a usability perspective. To achieve the above requirements, we propose an approach that can stylize video streams while providing interactive consistency-control. Apart from stylization, our approach also supports various other image processing filters. For achieving interactive performance, we develop a lite optical-flow network that operates at 80 Frames per second (FPS) on desktop systems with sufficient accuracy. We show that the final consistent video-output using our flow network is comparable to that being obtained using state-of-the-art optical-flow network. Further, we employ an adaptive combination of local and global consistent features and enable interactive selection between the two. By objective and subjective evaluation, we show that our method is superior to state-of-the-art approaches.
translated by 谷歌翻译
In this work, we address the problem of unsupervised moving object segmentation (MOS) in 4D LiDAR data recorded from a stationary sensor, where no ground truth annotations are involved. Deep learning-based state-of-the-art methods for LiDAR MOS strongly depend on annotated ground truth data, which is expensive to obtain and scarce in existence. To close this gap in the stationary setting, we propose a novel 4D LiDAR representation based on multivariate time series that relaxes the problem of unsupervised MOS to a time series clustering problem. More specifically, we propose modeling the change in occupancy of a voxel by a multivariate occupancy time series (MOTS), which captures spatio-temporal occupancy changes on the voxel level and its surrounding neighborhood. To perform unsupervised MOS, we train a neural network in a self-supervised manner to encode MOTS into voxel-level feature representations, which can be partitioned by a clustering algorithm into moving or stationary. Experiments on stationary scenes from the Raw KITTI dataset show that our fully unsupervised approach achieves performance that is comparable to that of supervised state-of-the-art approaches.
translated by 谷歌翻译
Implicit Neural Representations (INR) have recently shown to be powerful tool for high-quality video compression. However, existing works are limiting as they do not explicitly exploit the temporal redundancy in videos, leading to a long encoding time. Additionally, these methods have fixed architectures which do not scale to longer videos or higher resolutions. To address these issues, we propose NIRVANA, which treats videos as groups of frames and fits separate networks to each group performing patch-wise prediction. This design shares computation within each group, in the spatial and temporal dimensions, resulting in reduced encoding time of the video. The video representation is modeled autoregressively, with networks fit on a current group initialized using weights from the previous group's model. To further enhance efficiency, we perform quantization of the network parameters during training, requiring no post-hoc pruning or quantization. When compared with previous works on the benchmark UVG dataset, NIRVANA improves encoding quality from 37.36 to 37.70 (in terms of PSNR) and the encoding speed by 12X, while maintaining the same compression rate. In contrast to prior video INR works which struggle with larger resolution and longer videos, we show that our algorithm is highly flexible and scales naturally due to its patch-wise and autoregressive designs. Moreover, our method achieves variable bitrate compression by adapting to videos with varying inter-frame motion. NIRVANA achieves 6X decoding speed and scales well with more GPUs, making it practical for various deployment scenarios.
translated by 谷歌翻译
This work proposes a view of probability as a relative measure rather than an absolute one. To demonstrate this concept, we focus on finite outcome spaces and develop three fundamental axioms that establish requirements for relative probability functions. We then provide a library of examples of these functions and a system for composing them. Additionally, we discuss a relative version of Bayesian inference and its digital implementation. Finally, we prove the topological closure of the relative probability space, highlighting its ability to preserve information under limits.
translated by 谷歌翻译
Deep spiking neural networks (SNNs) offer the promise of low-power artificial intelligence. However, training deep SNNs from scratch or converting deep artificial neural networks to SNNs without loss of performance has been a challenge. Here we propose an exact mapping from a network with Rectified Linear Units (ReLUs) to an SNN that fires exactly one spike per neuron. For our constructive proof, we assume that an arbitrary multi-layer ReLU network with or without convolutional layers, batch normalization and max pooling layers was trained to high performance on some training set. Furthermore, we assume that we have access to a representative example of input data used during training and to the exact parameters (weights and biases) of the trained ReLU network. The mapping from deep ReLU networks to SNNs causes zero percent drop in accuracy on CIFAR10, CIFAR100 and the ImageNet-like data sets Places365 and PASS. More generally our work shows that an arbitrary deep ReLU network can be replaced by an energy-efficient single-spike neural network without any loss of performance.
translated by 谷歌翻译
Diffusion models have achieved justifiable popularity by attaining state-of-the-art performance in generating realistic objects from seemingly arbitrarily complex data distributions, including when conditioning generation on labels. Unfortunately, however, their iterative nature renders them very computationally inefficient during the sampling process. For the multi-class conditional generation problem, we propose a novel, structurally unique framework of diffusion models which are hierarchically branched according to the inherent relationships between classes. In this work, we demonstrate that branched diffusion models offer major improvements in efficiently generating samples from multiple classes. We also showcase several other advantages of branched diffusion models, including ease of extension to novel classes in a continual-learning setting, and a unique interpretability that offers insight into these generative models. Branched diffusion models represent an alternative paradigm to their traditional linear counterparts, and can have large impacts in how we use diffusion models for efficient generation, online learning, and scientific discovery.
translated by 谷歌翻译
Models of sensory processing and learning in the cortex need to efficiently assign credit to synapses in all areas. In deep learning, a known solution is error backpropagation, which however requires biologically implausible weight transport from feed-forward to feedback paths. We introduce Phaseless Alignment Learning (PAL), a bio-plausible method to learn efficient feedback weights in layered cortical hierarchies. This is achieved by exploiting the noise naturally found in biophysical systems as an additional carrier of information. In our dynamical system, all weights are learned simultaneously with always-on plasticity and using only information locally available to the synapses. Our method is completely phase-free (no forward and backward passes or phased learning) and allows for efficient error propagation across multi-layer cortical hierarchies, while maintaining biologically plausible signal transport and learning. Our method is applicable to a wide class of models and improves on previously known biologically plausible ways of credit assignment: compared to random synaptic feedback, it can solve complex tasks with less neurons and learn more useful latent representations. We demonstrate this on various classification tasks using a cortical microcircuit model with prospective coding.
translated by 谷歌翻译
Search algorithms for the bandit problems are applicable in materials discovery. However, the objectives of the conventional bandit problem are different from those of materials discovery. The conventional bandit problem aims to maximize the total rewards, whereas materials discovery aims to achieve breakthroughs in material properties. The max K-armed bandit (MKB) problem, which aims to acquire the single best reward, matches with the discovery tasks better than the conventional bandit. Thus, here, we propose a search algorithm for materials discovery based on the MKB problem using a pseudo-value of the upper confidence bound of expected improvement of the best reward. This approach is pseudo-guaranteed to be asymptotic oracles that do not depends on the time horizon. In addition, compared with other MKB algorithms, the proposed algorithm has only one hyperparameter, which is advantageous in materials discovery. We applied the proposed algorithm to synthetic problems and molecular-design demonstrations using a Monte Carlo tree search. According to the results, the proposed algorithm stably outperformed other bandit algorithms in the late stage of the search process when the optimal arm of the MKB could not be determined based on its expectation reward.
translated by 谷歌翻译